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Abstract 



We investigate the mutation-selection dynamics for an evolutionary computation model based on Tur- 
pq ^ ing Machines that we introduced in a previous article [1 J. 

p ^ I The use of Turing Machines allows for very simple mechanisms of code growth and code activa- 

tion/inactivation through point mutations. To any value of the point mutation probability corresponds 
a maximum amount of active code that can be maintained by selection and the Turing machines that 
. reach it are said to be at the error threshold. Simulations with our model show that the Turing machines 

qh| population evolve towards the error threshold. 

Mathematical descriptions of the model point out that this behaviour is due more to the mutation- 
selection dynamics than to the intrinsic nature of the Turing machines. This indicates that this result is 
J> ' much more general than the model considered here and could play a role also in biological evolution. 

■ 

^ ; 1 Introduction 

The study of "in silico" evolutionary models has increased significantly in recent times, see Il2l,|l3l,|l4l,||5l,||6l,||2l, 
m, ID just to give some examples. The basic idea behind these models is to simulate the evolution of computer 
algorithms subject to mutation and selection procedures. In this artificial evolution setting, the algorithms play 
the role of the biological organisms and they are selected on the basis of their ability in performing one or more 
prescribed tasks (replicate themselves, compute some mathematical function, etc.). While the simulated algorithms 
. have clearly an incomparably lesser degree of complexity than a whatever biological organism, the hope is that (at 

^ ' least some of) the phenomena observed in the digital evolution model could correspond to general behaviours of 

evolutionary systems. Indeed, it seems that this is what happens in some cases: emergence of parasitism in Q, 
quasi-species selection in I4j and the striking similarity between the C-value enigma Q and the phenomenon of 
code-bloat in evolutionary programming ifTOl . |[T|. 

One of the motivations for performing artificial evolution experiments is the continuously increasing computa- 
tional power of modern computers. Nowadays, very fast multiprocessor computers have relatively low prices and 
many scientific institutions have at their disposal large facilities for parallel computation. For example, one run 
lasting 50000 generations of a population of 300 Turing machines (TMs) of our evolutionary model lasts about 
half a day per processor on an ordinary home computer (for the higher value of the states-increase rate pi, for lower 
values it lasts considerably less). The long term evolution experiment on E. coli directed by R.E. Lenski reached 
the 40000 generations after almost 20 years [iTI (however, the population considered in this experiment is much 
larger, of the order of 10^ cells). When population size is not a crucial parameter, digital evolution experiments 
can explore a number of generations inaccessible to laboratory experiments with real organisms. If one wants to 
study evolutionary effects on a so large time scale in real biological organisms, then has to resort to paleontological 
studies. However, such studies are vexed by the incompleteness of the fossil record and by the unrepeatability of 
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the experiments. Indeed, repeatability allows to discriminate easily among effects due to adaptation and those 
simply due to drift. These problems are overcame in laboratory experiments such as Lenski one, but at the price 
of reducing the environment to a Petri dish. Artificial evolution experiments allow to explore larger time scales 
than laboratory experiments at much reduced costs, but at the higher price of replacing biological organisms with 
algorithms. By the way, there is another big advantage when performing artificial evolution experiments, namely 
the complete control over all the experimental settings. This gives the opportunity to use a reductionistic approach, 
by studying separately the effects of the various mechanisms involved in the evolutionary dynamics, something 
that is very difficult to obtain when working with real organisms. Finally, as a last argument in favour of artificial 
evolution experiments, we cite one given by Maynard Smith |fT2|: "...we badly need a comparative biology. So far, 
we have been able to study only one evolving system and we cannot wait for interstellar flight to provide us with a 
second. If we want to discover generalizations about evolving systems, we will have to look at artificial ones." 

As we said, even the most complicated computer algorithm is incomparably simpler than a whatever biological 
organism. Moreover, typical artificial evolution experiments have a unique ecological niche and the interaction 
between the artificial organisms is often limited to the comparison of their performances. So, a very big distance 
separates artificial evolution experiments from biological evolution. For this reason, many biologists are skeptical 
on the biological relevance of the results obtained in the digital framework; for example, some objections typically 
raised are reported in ||T3| . On the other hand, supporters of artificial evolution experiments reply that the observed 
results can actually be general phenomena of evolutive systems, therefore being independent from the particular 
model under consideration. To test this hypothesis it would be nice to compare the results obtained in the artificial 
evolution setting with real biological data, but this is very hard to do for long-term evolutionary effects, that is 
where artificial evolution models are most useful. On the other hand, general evolutionary behaviours do emerge 
if the mutation-selection dynamics have a prominent role on the peculiar characteristics of the evolving organism. 
When this is the case, the observed effects can be reproduced through a population genetic mathematical model. 
Indeed, these models center on the dynamics induced by the selection and mutation operators (under some work 
hypotheses), more than in the specific details of functioning of the organism. If successful, this procedure extends 
the validity of the results observed in the evolutionary model under consideration to all the evolutionary models 
working with the same mutation and selection operators (under the same hypotheses). This means that the problem 
of the biological relevance of the results obtained in the artificial evolution experiment is switched to the problem 
of assessing the biological likelihood of the mutation and selection operators and of the hypotheses used in the 
mathematical model. 

In this paper we apply this strategy, so that we derive a deterministic and a stochastic population genetic model 
of our evolutionary model for TMs. Our main aim is to show that the evolutionary dynamics pushes the TMs 
toward the error threshold |T4|, The population genetic model is used to compute mathematically the value of the 
error threshold and to show that this dynamical behaviour is due to quite mild hypotheses 

According to this program, in the Materials and Methods section we first briefly recall our evolutionary model 
for TMs m and the Eigen error threshold concept. A deterministic population genetic model for our digital 
evolution model is introduced in the third subsection "The deterministic model", while its stochastic counterpart 
(limited to the evolution of the best performing TMs) is given in the fourth one. In the Results section we report the 
results obtained by the computer simulations and compare them with those predicted by the mathematical models. 
Finally, our concluding remarks are given in the Discussion section. 

2 Materials and Methods 
2.1 The evolutionary model 

We basically use the same evolutionary programming model based on Turing Machines that has been introduced 
in IT]. The following are the only differences between that model and the model we use in this article: 

1. the TMs' movable head can move only right or left, now it cannot stay still (this also affects the definition of 
the added state); 

2. the TMs' tape is now circular, so that the TM head cannot exit from the tape; 

The first choice allows us to save one bit of memory for each state of the TMs and, at the same time, makes our 
definition more similar to the original one IfTSi . The second choice seems to us the most convenient when dealing 
with finite tapes. 

To the sake of making this article self-contained, we give a terse description of Turing machines and of the 
evolutionary programming model that we use. 
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Turing Machines are very simple symbol-manipulating devices which can be used to encode any feasible 
algorithm. They were invented in 1936 by Alan Turing ifTSl and used as abstract tools to investigate the problem 
of functions computability. For a complete treatment of this subject we refer to lfT6ll . 

A Turing machine consists of a movable head acting on an infinite tape T{t), see figure[T] The tape consists of 
discrete cells that can contain a or a 1 symbol. The head has a finite number of internal states that we denote by 
N (in which case the TM is called an N-state TM). At any time t the head is in a given internal state s{t) and it is 
located upon a single cell k{t) of the infinite tape T{t). It reads the symbol stored inside the cell and, according to 
its internal state and the symbol read, performs three actions: 

1. "write": writes a new symbol on the k{t) cell iT{t) t-^ T{t + 1)), 

2. "move": moves one cell on the right or on the left ^ k{t + 1)), 

3. "call": changes its internal state to a new state (s(t) i-^ s{t + 1)). 

Accordingly, a state can be specified by two triplets "write-move-call" listing the actions to undertake after reading 
respectively a or 1 symbol. There exists a distinguished state (the Halt state) that stops the machine when called. 
The initial tape T{0) is the input tape of the TM, and the tape T{t) at the instant i when the machine stops is 
its output tape, that is the result of executing the algorithm defined by the given TM on the input tape r(0). 
However, many TMs will never stop, so that they will not be associated with any algorithm. Moreover, the halting 
problem, that is the problem of establishing if a TM will eventually stop when provided with a given input tape, is 
undecidable. This means that there will exist TMs for which it is impossible to predict if they will eventually halt 
or not for a given input tape. 

We have to introduce some restrictions on the definition of the TMs in our evolutionary model. Since we want 
to perform computer simulations, we need to use a tape of finite length that we fix to 300 cells. The position of the 
head is taken modulo the length of the tape, that is we consider a circular tape with cell 1 coming next cell 300. 
Since it is quite easy to generate machines that run forever, we also need to fix a maximum number of time steps, 
therefore we choose to force halting the machine if it reaches 4000 steps. 

We begin with a population of 300 1 -state TMs of the following form 





1 





— movel — Halt 


1 


1 - move2 - Halt 



(1) 



where movel and move2 are fixed at random as Right or Left, and let them evolve for 50000 generations. At each 
generation every TM undergoes the following three processes (in this order): 

1. states-increase, 

2. mutation, 

3. selection and reproduction. 

States-increase. In this phase, further states are added to the TM with a rate p-,. The new states are the same as 
([T]l with the 1 label replaced by N + 1, N being the number of states before the addition. While it is clear that 
the states-increase should be considered a form of mutation (vaguely resembling insertion), we preferred to keep 
it distinguished because its effect is always neutral. 

Mutation. During mutation, all entries of each state of the TM are randomly changed with probability p,,,. The 
new entry is randomly chosen among all corresponding permitted values excluding the original one. The permitted 
values are: 

• or 1 for the "write" entries; 

• Right, Left for the "move" entries; 

• The Halt state or an integer from 1 to the number of states N of the machine for the "call" entries. 
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Selection and reproduction. In the selection and reproduction phase a new population is created from the actual 
one (old population). The number of offspring of a TM is determined by its "performance" and, to a minor extent, 
by chance. Actually, in the field of evolutionary programming the word used is fitness. However, in population 
genetics, fitness is used to denote the expected number of offspring (or the fraction that reach the reproductive age) 
of an individual. To avoid ambiguities, we decided to reserve the word fitness for this meaning, and to use the 
word performance for the evolutionary programming one. The performance of a TM is a function that measures 
how well the output tape of the machine reproduces a given "goal" tape starting from a prescribed input tape. We 
compute it in the following way. The performance is initially set to zero. Then the output tape and the goal tape 
are compared cell by cell. The performance is increased by one for any 1 on the output tape that has a matching 1 
on the goal tape and it is decreased by 3 for any 1 on the output tape that matches a on the goal tape. 

As a selection process, we use what in the field of evolutionary algorithms is known as "tournament selection 
of size 2 without replacement". Namely two TMs are randomly extracted from the old population, they run on the 
input tape and a performance value is assigned to them according to their output tapes. The performance values are 
compared and the machine which scores higher creates two copies of itself in the new population, while the other 
one is eliminated (asexual reproduction). If the performance values are equal, each TM creates a copy of itself 
in the new population. The two TMs that were chosen for the tournament are eliminated from the old population 
(namely they are not replaced) and the process restarts until the exhaustion of the old population. Notice that 
this selection procedure keeps the total population size N (with N an even number) constant. From our point 
of view this selection mechanism has two main advantages: it is computationally fast and quite simple to treat 
mathematically. 

The choice of TMs to encode the algorithms in our evolutionary model was convenient for various reasons. The 
first reason is that any feasible algorithm can be encoded through a TM (Church-Turing thesis lfT6l ): so that TMs 
are universal objects inside the algorithms class. The second reason is that even TMs with a very low number of 
states can exhibit a very complicated and unpredictable behaviour (even if the input tape is filled only with zeroes 
as in our case, see for example the busy beaver function 1171 ). Thanks to this property, it is very difficult to predict 
the dynamics of our evolutionary model. 

While developing the model, we were primarily interested in how the variations in the length of the code affect 
the evolutionary dynamics. From this point of view, the TMs present many advantages. The distinction between 
coding and non-coding triplets is unambiguous and very easy to verify. We define a triplet as non-coding (with 
respect to a given input tape) if mutations in its entries cannot affect the output tape of the TM and we will call it 
coding in the complementary case. This definition is practically equivalent to saying that a triplet is coding if it is 
executed at least one time when the TM runs on a given input tape and it is non-coding if it is never executed. In 
this way, to identify coding triplets, one has only to run the TM on the prescribed input tape and mark the triplets 
that are executed. Another advantage is that the mechanism of state-adding is completely neutral; the added states 
are always non-coding, so that they cannot change the performance of the TM. On the other hand, there is a simple 
mechanism of code activation. Namely, a triplet of a non-coding state s can be activated, for example, when a 
mutation occurs in the call entry of a coding state changing its value to s, but notice that also mutations in the 
write and move entries (of a coding triplet), can result in an activation or inactivation of the TMs triplets. Finally, 
another advantage of using TMs is that they are specified in terms of an atomic instruction: the state. 

2.2 The Eigen's error threshold 

The error threshold concept was introduced in 1971 by Eigen in the context of its quasispecies model lfT4l . ifTSl . 
The model describes the dynamics of a population of self-replicating polynucleotides of fixed length L, subject 
to mutation and under the constraint of constant population size. Each polynucleotide j'*' is characterized by its 
replication rate Ai, its degradation rate Di and the probabilities Qji of mutating into a different polynucleotide j'^' 
as a consequence of an inexact replication. All these parameters are assumed to be fixed numbers, independent of 
time and of population composition. The Eigen model then consists of a set of ODEs determining the evolution of 
the frequency (pi of the polynucleotides in the total population: 



where the sum is over all possible polynucleotide templates /^-'^ It is supposed that the polynucleotide I^^^ has 
a larger fitness than the others Ai — Di > Ak — Dk, k > 1. Such polynucleotide is usually called the master 
sequence while the others are called mutants. If we assume that mutation is exclusively due to point mutation, 
we neglect transversions, suppose that transitions have all the same probabilities of occurring and that the point 




(2) 
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mutation probability is independent on the site, then we can identify our polynucleotides as binary chains of length 
L and the mutation probabilities Qji depend only on the point mutation probability q and the Hamming distance 
d{i,j) among the binary chain and the binary chain 

Once assigned the Aj and Dj parameters, one can study the asymptotic composition of the population as a function 
of the point mutation probability q. It turns out that (at least for some choices of the fitness landscape, see [19], 
II20I . ET\ . Il22l . 1231 ) there is a sharp transition in the population composition near a particular value of q that is 
termed error threshold. Before the error threshold, the population is organized as a cloud of mutants surrounding 
the master sequence, while, after the error threshold, each polynucleotide is almost equally represented. In the 
thermodynamic limit (when the chain length L goes to infinity and the point mutation q goes to zero in such a 
way that the genomic mutation rate p = qL stays finite) this is a real phase transition of first order 1241 . and 
the error threshold is mathematically well defined. As a consequence, from this model it follows that natural 
selection can preserve the genome informative content only if the mutation rate is lower than the error threshold 
(see II25I ): after the error threshold, all the information content is lost. For a single peak fitness landscape (i.e. 
Ak — Dk = A2 — D2, k > 2) and in the thermodynamic limit, the system of equations (|2]i can be decoupled into a 
two by two system by introducing a collective variable (f)M for the overall frequency of mutants in the population 

00 

k=2 

In the thermodynamic limit, the fidelity rate of the master sequence will be given by Qn = e~P and the probability 
of back mutation Qim will go to zero. So, the Eigen equations take the form: 

01 = {Aie-P^Di)(bi-(bi[{Ai-Di)cj}i + {A2-D2)<j)M], (3) 
<pM = Ai {1 - e-P) (f>i + {A2 - D2)c^M - (t>M [{Ai - Di) c^i + {A2 - D2) c^m] . (4) 

The error threshold P, in this case, will coincide with the lowest value of the mutation probability P = 1 — e^P 
for which the master sequence goes extinct in the asymptotic limit t — > 00 |26l. Using the constraint (j>i + (pM = 1 
in equation (O we get a closed equation for 0i that gives: 

^ {A,-D,)-{A2-D2) 
Ai ■ 

Observe that the infinite population limit has the effect of removing the genetic drift and that the survival of 
the master sequence in the asymptotic limit for values of the mutation probability less than the error threshold is 
possible only for infinite populations. For finite populations, when the probability of reverse mutations is zero, 
the genetic drift will always push the population in its only absorbing state: the extinction of the master sequence. 
In the finite population case, however, the expected number of generations before the extinction of the master 
sequence will start to grow by several orders of magnitude when the mutation probability drops below the error 
threshold (see l27l . l28l ). This is the reason why the value of the error threshold predicted by the deterministic 
model works also for finite populations. This effect is also present in our model (see the next two sections and figure 
[13. 

2.3 The deterministic model 

In this section we will describe a deterministic mutation-selection model with tournament selection of rank two 
and we will obtain the corresponding error threshold. Our model of selection and reproduction is very different 
from that of the Eigen model. In particular, the number of offspring is not constant for each genotype, since while 
the performance landscape is fixed, the fitness landscape changes in time following the changes in the population 
composition. Despite this fact, when one neglects the probability of back mutations, one obtains a closed equation 
for the number of individuals with the best performance, as it happens for the Eigen model with the single peak 
fitness landscape. Following the example of the Eigen model (see also l29ll ). we will define the error threshold as 
the value of the mutation probability that causes the extinction of the master sequence (that, in our case, is the best 
performance class) for our selection model considered in the deterministic limit. 

Let us suppose that we have M possible performance classes, a population of size N, and let us denote with rii 
the number of individuals belonging to the ith performance class. In the selection step we draw 2 individuals from 
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the population without replacement and compare their performances. The individual with higher performance is 
copied into the new population and has a probability / to give raise to another copy, while, with probability 1 — / 
the second copy will belong to the individual with the lower performance. When the two individuals have the 
same performance, then both are passed to the new population. The two individuals are eliminated from the old 
population and the process is restarted until the old population is exhausted and the new population is replenished. 
Notice that it must hold < / < 1- The selection mechanism we used in our TMs model correspond to the 
particular choice / = 1. 

With this mechanism, each individual belonging to rii has a probability 



of making two copies of itself, a probability 



N - 1 



Pi = 



N 



+ni - 1 + (1 - D^rij 



3>i 



of making one copy of itself, and, finally, a probability 



j>i 



of making no copy at all. 

It follows that the expected number n[ of individuals in the ith performance class after selection is given by: 



iV- 1 



3>i 



(6) 



Notice that it holds: 



M 

i=l 

Now, let us consider the mutation step. We assume that the individuals in each performance class i share the 
same probability Qi of undergoing neutral mutations only, or no mutations at all. We will call Qi, with a slight 
abuse of terminology, the fidelity rate of the ith performance class. Let us denote with gij the probability that 
an individual in the jth performance class gives raise to an individual in the ith performance class as a result of 
a mutation {ga = since we included the neutral mutations in the fidelity rate Qi. Obviously, we could define 
Qi as the probability of undergoing no mutations at all and ga as the probability for the intervening mutations of 
being neutral. However, even if the alternative chosen in the text could seem clumsier it is better suited for our 
mathematical analysis.). This mutation mechanism gives raise to the following deterministic discrete equation: 



M 



^(1 - QjW^gij. 



(7) 



Notice that, since by definition. 



M 



it will also hold 



M 

1=1 



= N. 



Suppose now that gij ^ 1 if i > j, namely that the probability of a mutation to a higher performance class is 
very small, then the fraction of individuals undergoing a beneficial mutation in one generation is negligible. Let s 
be the best occupied performance class at a given time n,, > 0, rii ^ Q, i > s, and suppose 1 < s < M. From 
equation we get that it also holds n'^ > 0, n'^ — 0, i > s. Then, from O and (|6]l we get: 



n'^Qs = UsQs 




nsQs 



fjN-ris) 
N-1 



(8) 
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The best performance class is stably populated if n'!. 



,(1) 



and the second by 



(2") 

Tis is greater than zero if 



,(2) ^ L 



Qs > 



iV(l + /)-l 



We have two solutions. The first one is given by 

N -V 



(9) 



1 + / 



N-l 



1 + / 



o 



and in such a case Ug = nf' is a sink and = an unstable equilibrium, since the function n'g — Us is positive 



for Us £ (0, 71s ) and negative for Ug £ {ris ,N), as shown in figure|2] If 

^ 1+ f^' 

then there is only a sink in Ug ~ 0. Hence, the error threshold is given by 

0=l-P ^ 



1 + 

^ J N-l 



Neglecting the 0{1/N) corrections, we obtain; 



P = 



1 + / 



(10) 



(11) 



This is the same result that one gets from the Eigen model when considering the single peak fitness landscape with 
Ai = {I + f){A2 - D2 + Di) (see equation © and also 1261 ). 

With the previous argument we have shown that, after infinitely many generations, the best occupied perfor- 
mance class must satisfy Qs > Q namely that nj ~ for all j such that Qj < Q. We can now show that actually 
the index s is actually the largest possible one, namely that there is no class i such that Qs > Qi > Q. Indeed, 
let us suppose that gi+i^i ^ OVi = 1, . . . , M — 1, then if at a given generation, the ith performance class is 
populated, while the i + 1th is empty, then at the following generation we will have 



= ^{i-Qi) 92+1,1 m 
i<i 

> (1 - Qi) gi+i.i n. 



\J<1 3>l J 



N - 



'j<i 



> 



(12) 



So, a fraction of the population (possibly very small) will filtrate progressively into higher performance classes. 
This process will continue until the last performance class M or a performance class s such that Qs > Q, Qi < Q 
if 7 > s will be reached. Then the asymptotic occupation number of this class will be given by equation (O. A 
certain number of observations about this result are in order First, let us notice that according to equation ( fT2l i. the 
s + 1th performance class will be populated at each generation by mutants of the sth one. As we said, if gs+i,s is 
small, this number will be a tiny fraction of Us and we can neglect it (as we did in equation ([8])). So, what we have 
really shown is that the sth performance class is the last one that will have a significative occupation number. The 
actual value of this number will depend mainly on how near Qs is to Q. 

The second argument to keep into account is that the time to populate the sth class could be astronomical 
and will depend on the values of gij, Qi and /. In particular, to keep it reasonable, and / must not be 

exceedingly small. It is also necessary that for i < s the fidelity rates Qi are not smaller than or too near to Q. A 
natural assumption avoiding this occurrence is that Qi is a monotonically decreasing function of i. 

As an illustrative example, we show in figure|3]the results of a numerical simulation of the discrete system Q, 
dTll with the following choices of the parameters: M = 40, N = 100, / = 10^^, 



9i3 




if 7 = J = 1 or — 7 = 
if 7 = j = 40 or 7 — 
otherwise, 



1 
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Q, -(1-10-5)^, z = l,...,40. 



(13) 



and with all the 100 individuals in the first performance class as an initial state. Needless to say that this choice of 
the parameters does not pretend to have any degree of biological realism. The green points show the occupation 
numbers of the 40 performance classes obtained after 2 • 10^ generations, while the red line connect those obtained 
after 10^ generations. The maximum difference between the occupation numbers of the same performance class at 
2 • 10® and 10® generations is 5 • 10^^ and, consequently, cannot be detected. This means that the population had 
almost reached its stable state after 10® generations. The error catastrophe does occur when the fidelity rate ( fT3b 
is less than the fidelity threshold ( fTOb . With the above choices, we have Qi > Q for i = 1, . . . , 21 and Qi < Q for 
i — 22, . . . , 40. According to the above, we expect that, in the asymptotic limit, the performance classes from the 
22nd on, should be empty, while equation (|9j predicts that the 21st one should be occupied by 4.682 individuals. At 
the end of the simulation, the number of individuals in the 21st performance class is 4.686 while those in the 22nd 
are 2.02 • 10"^ and they go progressively decreasing, by approximately 5 orders of magnitude per performance 
class, while the performance class increases. 



The TMs critical number of coding states We made two hypotheses in our deterministic model to find the 
value of the error threshold (fTTT i. The first hypothesis is that gij ^ 1 if i > j, that is, that favorable mutations 
are extremely rare. This is a natural assumption in our TMs model, because very often the mutations induce a 
big change in the output tape that have a very small probability of being favorable. Moreover, in the next section, 
we will develop a stochastic model based on the same assumption, and we will see (figure ( fT2b ) that there is 
good agreement between the prediction of this model and the observed results. The second relevant assumption to 
compute the error threshold (fTTl i is that the individuals belonging to the best performance class s have the same 
fidelity rate Qg. We will make the further assumption that, for TMs, the probability that a mutation occurring in a 
coding triplet is neutral is also negligible. Then, the fidelity rate of a TM with Nc coding triplets is given by 

Q=(l-Pn,f^% (14) 

since mutations occurring in non-coding triplets are, by definition, neutral. It follows that, for a given value of 
Pm, the fidelity rate is determined only by the number of coding triplets of the TM. The assumption that the 
best performing TMs have the same fidelity rate is therefore equivalent to the assumption that they have the same 
number of coding triplets. Figure |4] shows that this assumption is very near to the truth when considered for a 
given run (that is what we really need). However, notice that the relation among s and Qs varies considerably 
among different runs. This is particularly evident in figure|5] where the number of coding triplets associated with 
the performance scores of 47 and 48 exhibits a more than two-fold variation. 

Having established that the hypotheses under which we have obtained the error threshold (fTTl i are accurate for 
our model, we can use it to determine the maximum allowed number of coding triplets for a TM. In the case of our 
model, / = 1, so that equation (fTTT ) give us the error threshold at P = 1/2. The mutation probability for a TM 
with iVc coding triplets is given by: 

P=l-(l-p„)3^'. (15) 
By equating (fTSll to the error threshold, we get the critical number of coding states for the TMs; 

ln(2) 
31n(l -pn,) 

This expression is represented by the thick black line in figure|6l The ultimate fate of TMs with a number of coding 
states larger than N*, according to our deterministic model, will be the extinction. 



2.4 The stochastic model 

In this section we will keep into account the stochastic effects in our mutation and selection procedures. 

We recall that the constant population size N must be an even number We will introduce a stochastic model 
for the evolution of only the number Ug of individuals with the best performance value. Let us consider separately 
the selection and mutation steps. 



The selection step Since we are interested in the evolution of the number Us of the best individuals only, we can 
put all the remaining N — Ug individuals into the same class. Let us denote with the symbol "1" the individuals of 
the best performance class and with the symbol "0" all the others. We will denote by n'g the number of individuals 
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in the highest performance class in the new population, n'^ will be determined by the number of pairs 11, 10 and 
00 that we will get extracting random pairs without replacement from the old population. Let us denote by k the 
number of 11 pairs, by I the number of 10 pairs and by m the number of 00 pairs. As a consequence we will have 

ns^2k + l n', = 2{k + l) 2{k + 1 + m) = N (17) 

The probability that we get 71', = 2(fc + I) individuals into the best class when applying the selection step to a 
population with Ug = 2k + I individuals into the best class, is given by the probability that we extract k 11 pairs, I 
10 pairs and m 00 pairs from a set containing 2k + I ones and I + 2m zeroes. This probability is given by; 

k I m 

Indeed, the 2' term keeps into account that the I pairs 10 can be obtained extracting the 1 before the or vice versa. 
The term 

k + I + m 
k 

gives the number of possible distributions of the k 11 pairs inside the k + I + m total pairs. The term 

I + m 
I 

gives the number of possible distributions of the / 10 pairs inside the remaining / + m pairs. Finally, 

2{k + l + m) 
2k + I 

is the number of possible distributions of the 2fc + / 1 symbols in the 2{k + I + m) = N possible places. 
Let us notice that 

• n'g is always even. 

If we fix Us and n'^ satisfying the above constraints, we can obtain k and I as a function of rig and n'/. 

2ns - ri', , 

L — — 



2 

Since the total population is fixed to N we have: 

N ~ n' 

2{k + l + m) = N =^ m = ^-^ 

Hence, the probability of getting n'^ individuals into the best performance class after applying the selection proce- 
dure to a population with individuals into the best performance class is: 

if 71 < Us or n'^ > min(27T,s, N) 
if 71; odd 

Pripin-s ^ ri's) ^ { / N \ / N-2n,+n' 



The mutation step Let us introduce mutation into the model. We will follow to use the two simplifying assump- 
tions that we used for the deterministic model, namely: 

1. TMs in the best performance class have the same number of coding triplets N^. 

2. Mutations in coding triplets are (almost) always deleterious. 
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By definition, mutations in non-coding triplets are neutral. 

Under this assumptions if n'^ is the number of best individuals before mutation, the probability of getting 
n" ~ n'g — k individuals after the mutation step is given by 

p™«tK ^ < = - fc) = ( ) p\i - p)<-\ 

where we denoted by P the probabihty that an individual in the best performance class will undergo at least one 
mutation into a coding triplet. 



The Markov matrix If the total population N is finite, under our assumption the best individuals will always 
go extinct in a finite time. The expected number of generations r before it happens can be computed using the 
Markov matrix M of the process ll30l . The entries A/y of the Markov matrix give the probability that the system 
under scrutiny pass from its ith state to the jth one. In our case the state of the system is labeled by the number ng 
of individuals into the best performance class and the entries of A/ will be given by 

JV 

-Mne+i,n'/+i = X! ^rip{ns ^ n'JPjnut{n'g n'!.), ns,n'^ = 0,...,N 

The state n" = will be an absorbing state for M and the procedure to compute the expected number of gener- 
ations T for reaching it, works as follows. Let S be the matrix that one obtains by removing the first row and the 
first column corresponding to the only absorbing state and let c be a A^— dimensional vector whose entries are all 
one. The matrix 1 — 5, where I denotes the identity matrix, is invertible. If the Markov process begins in the state 
i, then the expected number of generations before extinction will be given by: 

T=[iI-S)-'c]^. (18) 

Let us stress that the equation (fTsT l is obtained by assuming that the system evolves for an infinite number of 
generations. The expected extinction times versus the number of coding triplets are plotted in figure [12] for 5 
different values of the mutation probability. 



2.5 Simulations settings 

In this subsection we introduce the parameter values that we adopted in our computer simulations. 

We chose the goal tape containing the binary expression of the decimal part of tt (the dots are just a useful 
separator): 

0010010000.1111110110.1010100010.0010000101.1010001100.0010001101.0011000100.1100011001. 
1000101000.1011100000.0011011100.0001110011.0100010010.1001000000.1001001110.0000100010. 
0010100110.0111110011.0001110100.0000001000.0010111011.1110101001.1000111011.0001001110. 
0110110010.0010010100.0101001010.0000100001.1110011000.1110001101. 

As a consequence, the maximum possible performance value is 125. We performed simulations with the following 
(approximate) values of the states-increase rate p; and point mutation probability p^^: 

Pi e {9.26 • 10"^ ; 1.66 • 10"^ ; 3.00 • lO"** ; 5.40 • 10"^ ; 9.72 • 10"* ; 1.75 • 10"^ ; 
3.14 • 10"^ ; 5.68 • 10"^ ; 1.02 • 10"^ ; 1.85 • 10"^ ; 3.33 • 10"^; 6.00 • 10"^ ; 
1.08 • 10-^ ; 1.95 • 10-^ ; 3.51 • lO^^ ; 6.33 • 10"^ ; 1.14} . 
Pn, e {4.91 • 10"^ ; 8.10 • 10-5 ; 1.34 • lO"'' ; 2.21 • 10""^ ; 3.64 • 10"'* ; 6.01 • 10"'' ; 
9.91 • 10"* ; 1.64 • 10"^ ; 2.70 • 10"^ ; 4.44 • 10"^ ; 7.35 • 10"^ } . 

These values have been chosen in such a way that consecutive ones have a constant ratio. For any pair of values 
Pi,Pm, we performed 20 simulations varying the initial seed of the C native random number generator, for a total 
of 3740 runs. Each simulation lasted 50000 generations. 
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3 Results 



3.1 Performance, coding triplets and mutation probabilities 

In this subsection we analyze how the performance and the number of coding triplets of the best performing 
machines vary with the different values of the mutation and states-increase rates. 

In figure Q we plot the best performance value obtained in the population at the last generation (averaged on 
the different choices of the seed) versus the state-increase rate pi and the mutation probability The maximum 
performance value of 50.6 is obtained for the maximum value of pi, pi ~ 1.14 (see figure|2lc) and an intermediate 
value of Ptn, Pm — 3.64 • 10^'* (see figure|2ld). In figure[8] we show the number of coding triplets (averaged 
on the best performing machines at the last generation and on the seeds), versus pi and p^. Again the maximum 
value TVc = 333.9 is obtained for exactly the same values of p; ~ 1.14 and p^ ~ 3.64 • 10^**. This fact and the 
similarity between figureQa and[8]suggest a strong correlation between the performance and the number of coding 
triplets. Indeed, the correlation coefficient between them is r = 0.95 (see also figures l5ll9l[T0b. The fact that the 
maximum performance occurs for an intermediate value of p^ is also partially due to this correlation. Indeed, if the 
mutation probability is too low, there is no enough variability among the TMs for selection to work on, while when 
the mutation probability is too high, the error threshold exerts a strong limiting action on the maximum number of 
coding triplets. This latter effect is clearly visible in figures |5] and |6] both taken after 50000 generations. Indeed, 
in figure |5] the abscissa positions seldom exceed the corresponding vertical lines at N*. The presence of the error 
threshold also affects the trend of the performance with the generations. Indeed, when both pm and p; are large, the 
TMs approach very early the maximum number of coding triplets. From that moment on, further accumulation of 
coding triplets is strongly opposed by mutation and selection. This leads to a saturation in the performance and in 
the number of coding triplets that is clearly visible in the plateau of figure |9] (b) and (d). Notice that this plateau 
effect is not present when pi is small (figure|9] (a) and (c)). This behaviour suggests that an adaptative choice for 
the mutation probability could maximize the speed of evolution. Indeed, one could start with an high mutation rate 
in the first generations to increase the variability, progressively diminishing it when the number of coding triplets 
increase to reduce the limiting effect due to the error threshold. We presented a proposal for the optimal adaptative 
mutation probability for this model in BDl . 

Figure [TOl shows, on a log-log scale, the relation between and pi. The straight line of linear regression has 
been evaluated in the range pi < 3.33 • 10^^. The reason is that this range corresponds to the one considered in IT], 
allowing us to compare the two results. Moreover, it is clear that the linear regime does not hold for large values 
of Pi, for which we observe a saturation effect. The regression gives the relation 

TVc = 7.3 • 10^ present simulations, 

TVc = 2.5 • 10^ paper m, (19) 

both exponents being close to i. Now, if iVt is the total number of states, its expected value is 

TV, = 50000- Pi + 1 ^ P'^--^^' (2^^ 

(this is true in absence of selection but, as discussed in HJ, it remains approximately true even with selection, 
except for very small values of pi) so that 

^ oc ^ (21) 

This means that the fraction of coding triplets on the total will decrease when p\ increases (strictly speaking, this 
analysis holds only for the linear regime however it is clear that, for larger values of pi, the plateau of figure [TOl 
corresponds to an amplification of this effect). 

As in our previous paper fl\, we observe that the maximum performance is obtained for the maximum value 
of Pi. However, this time, the trend of the performance with pi is not strictly monotonically increasing, since, as 
shown in figure |7]c, there is a plateau for high values of pi. Notice that this plateau corresponds exactly to the 
region in figure [TO] where the number of coding triplets reaches a saturation, so that this effect also is due to the 
presence of the error threshold. For various reasons, explained in (JJ, we believe that the performance should 
exhibit a maximum for a finite value of pi (this is one of the reasons that led us to increase upward the range of 
variation of pi). Unfortunately, it seems that if this maximum exists, it lies outside of the range of values of pi that 
we selected. Finally, it is interesting to compare figure Qc restricted to the range of pi values considered in HI 
with the figure 3.c of HI, that we reproduce here (see figure [TTTi. Despite the fact that we changed the number of 
generations (they were 200000 in HI), the way the head can move on the tape (in ||T| it could also stay still) and the 
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topology of the tape (in |[T1 it was not periodic), the two profiles are very similar This means that the dependence 
of the performance on pi is very robust in this model. That's not the case of the dependence of the performance on 
Pm, that is influenced, for example, by the choice of the number of generations. 

The trend of the performance with pi is interesting because it suggests that, in our model, the presence of 
inactive and free to mutate code strongly improves the evolvability of our populations. 

3.2 Extinction times 

In figure [12] we plot the base 10 logarithm of t, the expected number of generations before extinction given by 
equation (fTST l versus the number of coding triplets (red line), and superimpose the data obtained from our simula- 
tions (blue points). The red line is obtained numerically starting with a unique individual in the best performance 
class (i = 1 in (fTsT l) through the Markov matrix of the process, as explained in the previous section. The blue 
points give the observed value of r averaged over bins through the following procedure. First, the whole range of 
the number of coding triplets is divided into bins. The size of the bin is different for the different values of the mu- 
tation probability p,n and is given by the smallest integer greater than or equal to the critical value for the number 
of coding triplets (see eq. ( fTSl l) divided by 40. It is necessary to introduce bins because otherwise, especially when 
N* is large, there are too few extinction events associated with any value of the coding states to give raise to an 
also minimal statistics. On the other hand, one has to avoid that the bin size is so large that the expected number 
of generations before extinction varies considerably inside it. It seems that dividing N* by 40 is a good choice for 
the bin size. 

Now, let us suppose that at a given generation r', the data register a drop in the maximum performance value, 
then we go back to the generation t when this performance value appeared and we count the number of TMs 
scored with it. If this number is exactly two, then we register the extinction time r' — r and increment the number 
of extinction events registered in the bin containing the number of coding states of the two TMs at the generation 
r. The discrepancy between the fact that we compare the extinction data obtained starting with two individuals in 
the best performance class with those obtained from the Markov process starting with a unique individual is due to 
the fact that our program registers the data after the selection step, when the best performing individual has already 
made a copy of itself. We are neglecting the quite improbable case that after mutation two new best performing 
individuals (with the same performance) do emerge and they are extracted as a pair in the subsequent selection 
step. If the number of extinction events registered for a bin is greater than or equal to 5, then a blue point is plotted 
with an x coordinate equal to the center of the bin and an y coordinate equal to the mean of all the registered times 
of extinction. The requirement to have at least 5 extinction events in each bin is due to the fact that the expected 
number of generations before extinction corresponds to the mean over an infinite number of extinction events. 
This pushes us to select a minimum number of extinction events as large as possible, to reduce the stochastic noise. 
On the other hand, if we choose a too large number, then we get too few points from our data. Again, to fix the 
minimum number of extinction events per bin to 5 seemed to us a reasonable compromise. 

The figure [T2]corresponds to the 5 largest values of the mutation probability p^ considered in the simulations. 
For smaller values of p^, too few "experimental" points are obtained. We see that the agreement between the 
theoretical model and the simulation data is extremely good from large values of to the peak of the blue points, 
that occurs when the expected number of generations before extinction is near 100. On the left of such peak, the 
agreement is completely lost and a peculiar monotonic growth of r appears instead. The main reason is that we 
run our simulations for 50000 generations, while the theoretical model assumes an infinite number of them. This 
implies that the agreement between the data and the theoretical model will be good until when 50000 generations is 
a good approximation to 00, that is when 50000 generations is much larger than the expected number of generations 
before extinction. Clearly, when this latter number increases, the approximation is doomed to worsen. Indeed, 
when the theoretical model predicts that the expected number of generations before extinction is larger than 50000 
(around y = 4.7 in our graphs), the agreement between the model and the simulations is impossible. 

We suggest two possible mechanisms to explain why in the region where there is no agreement with the 
theoretical model, the observed extinction times increase while increasing the number of coding triplets. The 
first mechanism is that, in this region, there is a relatively high probability of extinction when the best TMs have 
just emerged. Indeed, we know that at the beginning there will be only two TMs in the best performance class 
(otherwise we do not register the data, as we explained above). If both TMs undergo a mutation (in a coding triplet) 
in the next mutation phase, then they will most probably go extinct. On the other hand, if they start to spread into 
the population, then extinction becomes more and more improbable and, on consequence, the time to wait to 
observe it largely increases. In these cases, the cut to 50000 generations will throw away a considerable portion 
of the extinction probability distribution, with the effect of amplifying the weight of the probabilities before that 
generation. This effect is clearly visible in figure[T3]where we plotted for p^ ~ AAA ■ 10^^ and five different values 
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of Nc, the extinction probabiUty distribution renormalized to 1 in the range between 1 and 32768 generations (this 
number is dictated by computational reasons). We see that the relative probability of observing an extinction event 
in the first few generations decreases with N^^ below N* and increases after, in accordance with figure [T2l 

Another mechanism can amplify this effect. Indeed, if the TMs extinction time is large, the probability that an 
increase in the performance value will occur does also increase. In such a case, the extinction of the original TMs 
simply will not be registered, creating a bias toward short extinction times. 

3.3 The route to the error threshold through punctuated equiUbria 

Let us notice that all the possible output tapes (and consequently all the possible performance scores) can be 
obtained with a 300 states TM with 300 coding triplets and running for 300 time steps. Indeed, let us denote with 
o the desired output tape and with Oi the entry of its ith cell, then the following 300 states TM will produce it: 





i 






300 





— Right — i+1 


i= 1,...,299, 





O300 - * - Halt 


1 






1 





Here the * symbol means that the corresponding entry of the state is irrelevant. Notice also that this is only a 
possible solution and, quite probably, not the shortest one. 

Figure |5] shows that, for some of the values of the mutation probabilities, TMs can indeed accumulate 300 
coding triplets, or even more, during the 50000 generations. So, the TMs could, in principle, attain the maximum 
possible performance value of 125. However, the actual maximum performance value obtained in the 3740 runs 
of our simulations is 70, quite far from the theoretical maximum. This is due to the fact that TMs do not optimize 
the use of coding triplets as it is apparent from figure |5] Indeed, if we consider the various TMs that obtain a 
performance value of 47 (for example), we see that the number of coding triplets spans a range from 131 to 443. It 
is worth noticing that by diminishing the mutation probability, the number of coding triplets tends to spread and to 
shift toward larger values. Moreover, even if the TMs use many more coding triplets than strictly necessary, we see 
from figure |9] that once they approach the error threshold, the performance growth with generations slows down 
considerably, while the number of coding triplets remains practically constant. In this way, an "historical" factor 
is introduced into the evolutionary dynamics; namely, once the TMs have wasted their coding triplets, they need a 
very large amount of time to achieve a more efficient usage. 

The mathematical model developed in the Methods showed that under the following hypotheses 

1. Q, < IVi, 

2. Qi is a monotonically decreasing function of i, 

the mutation-selection dynamics will always decrease the fidelity rates of the evolving organisms (in our case the 
TMs), until they reach the error threshold Q or the highest performance class M. All of these hypotheses do hold 
true for our evolutionary model. Indeed the first one is implied by the definition ( fT4] i of the fidelity rate for the 
TMs, that was also used to calculate the extinction times. From (fT4l) and from the fact that the performance and 
the number of coding triplets are positively correlated it follows that, on average, the fidelity rate decreases while 
the performance increases. The second hypothesis corresponds to the deterministic limit of this effect. 

The third hypothesis states that it is always possible to increase by one the performance through mutations. 
From the simulations we see (figure |9|l that the performance grows almost linearly with the generations until 
approaching the critical number of coding triplets, thus supporting this hypothesis. 

From a theoretical point of view, a performance increase of one can be obtained in our model in the following 
way. Let us suppose that the TM stops before reaching the 4000 maximum time steps (entering into the halt state). 
Let d be the distance on the output tape between the head position after the TM stopped and the nearest cell that 
would improve the performance score if its value would be changed. The TM can then increase its performance 
by one by adding d further coding triplets that move the machine head on the desired cell (without altering the 
intermediate cells) and change its value. What it is important is that the probability for this process to happen 
is not exceedingly small, so that performance increases can be observed within the generation range. The above 
mechanism does not work for non-halting machines. Indeed, since the maximum observed number of coding 
triplets is less than 600, non-halting machines have to use several times some subset of them. If we introduce a 
mutation inside a coding triplet belonging to this subset, with the aim of modifying the dynamics at a given time 
step, we cannot predict how it will affect the earlier dynamics (notice that, in the previous case, we are sure that 
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the triplet calling the Halt state is executed only once). So, in the non-halting case, we cannot see any simple 
recipe to increase the performance. In general, mutations inside the above subset of coding triplets will probably 
result in a big change in the output tape and will be almost always discarded by selection. This leads to the fact 
that non-halting machines need longer times to improve their performance. We analyzed the 3740 best performing 
machines that we registered at the end of the 50000th generation and we found that 2730 stop by calling the halt 
state, while the remaining 1010 stop by exhausting the 4000 time steps. The former TMs have a better average 
performance (19.57) compared to the latter ones (11.95). We found that, on average, the distance d is 5.99 for the 
halting TMs and is 2.47 for the non-halting TMs. As a reference value, the average distance of two adjacent ones 
on the output tape is 2.44. The fact that non-halting machines have, on average, lower performance and d values is 
consistent with the above analysis. 

Since in our simulations no TM reached the maximum performance of 125, our deterministic model predicts 
that they will accumulate coding triplets until reaching the error threshold. However, this model assumes an infinite 
number of generations, while our simulations last 50000. We can see from figure|9]c and|9]d that, when far from 
the critical number of coding triplets, the TMs do indeed accumulate coding triplets in a steady way, that depends 
on the values of and pi. While for high values of these probabilities, 50000 generations are enough to reach 
the critical number of coding triplets, they are not for low ones (see figures |6]|9]c,|9]d and|5]l. We conclude that 
our evolutionary model gives a working example of the dynamical behaviour predicted by the deterministic model 
described in the previous section. 

In figure [T4]a we show the time evolution of the higher performance score for all values of the state-increase 
probability pi corresponding to a particular choice of the seed of the random number generator and p^ = 3.64 • 
10^^. We observe the dynamical behaviour typical of punctuated equilibria f32l: long periods of stasis and briefs 
periods of rapid evolution. The same kind of evolution is observed also for the other choices of the seeds, so that 
we present figure[T4]a as a representative case. The apparent big jumps in the performance in figure[T4]a, as that 
from 28 to 46 in the boxed region, are simply due to the time scale used. A zoom of the boxed region (fig. [T4lb) 
shows that this big jump is really composed by 14 jumps of one point and 2 jumps of two points, occurring in a 
relatively short number of generations. The same is true also for the other big performance jumps observed in figure 
fT4la. Indeed, figure [15] shows the histogram of the number of occurrences of positive performance jumps versus 
their amplitude. Performance jumps of one point (the minimum possible value) are, by far, the most common, 
while performance jumps larger than three are extremely infrequent. There is however a single, quite amazing, 
performance jump of 12 that in figure [TSlis not visible due to the scale used. The mean positive performance jump 
in our simulations is 1.16. This value seems to us sufficiently near to the minimum possible value of 1, that we 
would say that the evolution of the performance in our model is essentially gradualistic. Let us stress that while the 
mechanism proposed in 1321 is based on a particular speciation mechanisms, in our case this dynamical behaviour 
is determined only by the form of the performance landscape. 

The mean performance jumps for different values of and pi fluctuate between 1.00 and 1.27, the largest 
value appearing for pn, ~ 1.64 • lO"'' and pi ~ 3.51-10"^. Notice that this probability values do not coincide with 
those associated with the largest performance, namely p^ ~ 3.64 • 10^'*, pi ~ 1.14, whose corresponding mean 
performance jump is 1.13, that is lower than average. 

For the value of the mutation probability considered in figure [14] the TMs stays far from the error threshold 
and we see the typical increasing trend of performance with generations. In figure[T6]we present the same graph 
but for a much higher value of the mutation probability pm- In this case, for some values of pi, the TMs do indeed 
reach the error threshold. From that moment on a typical oscillatory behaviour around a base performance value 
emerges. There are many performance increases followed by a rapid extinction and also performance decreases 
followed in a short time by back mutations. 

4 Discussion 

In this paper we studied, through computer simulations and mathematical modeling, the dynamics of an evolu- 
tionary model for Turing machines. In the mathematical models, by imposing suitable hypotheses on the impact 
of mutations on the performance landscape, we were able to compute the value of the error threshold and the 
expected extinction times for Turing machines versus the mutation rate. The agreement between theoretical and 
simulation data (see fig. \T7\ prove that the hypotheses we made are accurate for our model. Our main finding is that 
evolution pushes the TMs towards the error threshold. Again, we substantiated this finding through mathematical 
analysis, by showing that this behaviour is due to the mutation and selection mechanisms used and on some further 
hypotheses related to the structure of the performance landscape. Consequently, the question of the similarity be- 
tween TMs and biological organisms is irrelevant to address the problem of the biological relevance of this finding. 
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What is really relevant is the biological plausibility of the mutation-selection mechanisms and of the hypotheses 
employed. Let us stress that, despite this fact, our model still has to be considered as a toy model of evolution, 
so that it contains simplifying (hence necessarily unrealistic) hypotheses that makes it mathematically affordable. 
Nevertheless, toy models can give valuable suggestions on mechanisms working also in the full, non-simplified 
system of which they are approximations. In the following we will try to discuss the hypotheses we made from a 
biological point of view. 

The main and most relevant approximation is to consider a unique and fixed performance landscape. In nature, 
the existence of different ecological niches, the changing environment and the coevolution with other species give 
raise to multiple and ever changing fitness landscapes. While it is difficult to estimate how this approximation 
influences our conclusion, it is worth noticing that an ever changing performance landscape makes a perfectly 
fit organism substantially unattainable. By ruling out one of the possible end points of evolution, this fact could 
reinforce our results. 

The deterministic mutation-selection model that we used is completely specified by the selection mechanism 
and by the choices of the parameters Qj, j — 1, . . . , M, gij, i,j = 1, • . • , M and /. The values of Qj and gij 
are related to the mutation mechanisms and to the genotype performance mapping, while / specifies, through 
the tournament selection, the performance — > fitness mapping. Let us first discuss the selection mechanism and 
the choice of /. First of all, the selection mechanism keeps the total population N constant (soft selection). This 
is a frequent assumption in population genetic models (for instance it is used in the Wright-Fisher model and in 
the Eigen model). From a biological point of view it translates in assuming that the population fecundity is always 
enough to keep it to the (constant) carrying capacity N of the environment. The fact that only two individuals 
are compared at each generation is clearly unrealistic from a biological point of view. However, if one considers a 
large number of generations, virtually all individuals will have interacted through this pair interactions, so we think 
that a more realistic interaction mechanism would not alter the conclusions. Also the assumption that the fitness 
difference / depends on the performance values of the two individuals only through the signum of their difference 
is not realistic. However, since the conclusion that the population will eventually reach the error threshold holds for 
any / > 0, a more reaUstic choice would not change it, but only affect the population distribution in performance 
classes near the error threshold. 

Regarding the mutation mechanism, we needed three basic assumptions: 

1 . there is no perfect replicator, 

2. the fidelity rates and the performance classes are negatively correlated, 

3. the probability of improving the performance is never exceedingly small. 

The first and the third hypotheses seems to us perfectly acceptable from a biological point of view. The second 
assumption deserves a deeper discussion. It can be interpreted as saying that the performance of an individual is 
incremented mainly through the addition of coding DNA (here we use "coding" in the same informatic/algorithmic 
sense that we used for our TMs model, to indicate parts of the genome that influence the phenotype; the usual 
biological meaning would be to indicate protein coding sequences), and that this addition increases the probability 
of undergoing a non-neutral mutation. 

The latter statement is quite natural: if, for example, an organism increases its performance by converting a 
piece of junk DNA into a new gene (the metabolic cost should be kept into account to evaluate if there is a real 
performance increase), all the mutations that inactivate the new gene will be new non-neutral mutations. According 
to the first part, one has to assume that organism improve their adaptation more by increasing their coding DNA 
than by reorganizing it. 

We suggested that the fact that the mutation-selection dynamics pushes the evolving organisms towards the 
error threshold is due to quite mild hypotheses and could be a quite robust property of evolutionary systems. 
Indeed, the same phenomenon appears in another artificial evolution experiment ITj, where the codification of the 
evolving algorithms and the mutation and selection mechanisms used are completely different from ours. 

At the biological level, RNA viruses have error rates (per genome per replication) near to one and have been 
suggested to replicate near the error threshold (see for example 1251 and the references therein), in accordance 
with the behaviour that we observe in our model. Indeed, these organisms lack proof-reading mechanisms, so 
that their mutation rates per nucleotide are quite large. Moreover, the necessity to escape the immune response 
produces a selective pressure toward an high variability. Notice, however, that this latter effect is not present 
in our evolutionary model, since we considered a static performance landscape. For what concerns DNA based 
organisms, they also have remarkably small variations in their mutation rates (331. However, their mutation rates 
are much smaller than those of RNA viruses, of the order of 1 /300 per genome per replication. In 11251 it has been 
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suggested that these almost constant mutation rates could be due to the fact that also DNA based organisms do 
reproduce near the error threshold. Their higher fidelity rate could be explained by two factors: a larger number of 
neutrals in DNA sequences and the dissymmetry between the error rates of the two daughter DNA double strands 
(see II25I ). While our results could encourage this explanation, the development of error correction mechanisms 
is not considered in our model. Approaching the error threshold surely induces an high selective pressure on the 
development of error correction mechanisms. The short term advantage derived by an increase in the reproductive 
fidelity could be reinforced by the long term advantage of an higher evolvability (if the organisms evolvabilities 
are much lower near the error threshold as it happens in our model, see figure |9]l. Maybe, the mutation rates 
observed for the DNA could be due to a balance between the natural trend toward the error threshold, due to the 
mutation-selection dynamics, the need of proof-reading mechanisms and their metabolic costs. 

In this paper, we showed that some of the features observed in an artificial evolutionary model can have a much 
more general validity than the specific model itself. This happens when the phenomenon under consideration is 
mainly due to the mutation-selection dynamics, so that it can be described through a population genetic model. It 
seems to us that this synergistic integration between artificial evolution and population genetic model should be 
pursued, when possible. Since the phenomena observed in an artificial evolutionary model can have a quite wide 
degree of generality, we think that they can give interesting suggestions on possible evolutionary mechanisms 
working also at the biological level. 
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Figures 




m 

Figure 1. Graphical representation of a Turing machine. The machine is shown at time t, in the 
internal state s{t), located on the fc(t)-th cell of a infinite tape. 




N 



Figure 2. Stable state for the occupation number of the highest occupied performance class. 

The blue curve represents the function n'^{ns) defined by ([8]l, while the red line corresponds to 
n" = Us- The green points represent 15 iterates of the discrete map ([8]) starting by the initial datum 
n. The asymptotic value of the map (O will be ni^'^ for any initial datum n ^ 0. 
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Figure 3. Results of the numerical simulation described in section "The deterministic model". 

The green points show the occupation numbers of the 40 performance classes obtained after 2 • 10^ 
generations while the red line connect those obtained after 10^ generations. The prediction in ( fTOl 
[T3] ) for the best occupied performance class is s=21. 
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Figure 4. Histogram of the distribution of a/Nc for the 3740 runs of our simulations. Nc is 

the average number of coding triplets for the best individuals in the last generation and a is the 
corresponding standard deviation. The size of the bins is 0.0025. 
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Figure 5. Performance versus the number of coding triplets. The performance is shown for 
the best performing TMs at generation 50000 for the 3740 runs of our simulations. Each color 
corresponds to a different value of the mutation probability as indicated in the scale under the image. 
The dashed lines correspond to the critical number of coding triplets for the 6 highest value of the 
mutation probabilities. For lower values the corresponding critical number of coding triplets lies 
outside of the graph. Notice that many points are superimposed. 




Figure 6. Plot of the number of coding triplets for the best machine in the population. The 

number of coding triplets after 50000 generations, averaged on the seeds, is shown as a function of 
Pm, for all the values of pi. The black thick line on the right represents the critical number of coding 
triplets, according to equation (fT6b . 
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Figure 7. Best performance value in the population at the last generation. The best performance 
is averaged on the twenty different seeds and plotted as a function of the states-increase rate pi and 
of the mutation rate pm- (a) shows a 3D view, while subfigures (b), (c), (d) correspond to the three 
orthogonal projections. 
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Figure 8. Number of coding triplets in the population at tlie last generation. The number of 
coding triplets is averaged on the best machines and on the seeds; it is plotted versus pi (right) and 
Pm (left). 
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Figure 9. Data along the generations. Here we show the evolution of the performance (top) and 
of the number of coding triplets (bottom) with the generations, for the values of pi indicated by the 
matching colours and for two values of p^. In (d), the dashed line represents the maximal number 
of coding triplets (fTSI l. In (c), the corresponding line is outside the graph, being N* = 1047.0. Data 
are sampled every 100 generations and averaged on the seeds. 
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Figure 10. Correlation of the mean number of coding triplets Nc versus the states-increase 
probability p\. For each value of pi, only the four best values of the final performance (at four 
different pm) are retained for the evaluation of Nc, for each seed. The green straight line of linear 
regression is evaluated on the range pi < 3.33 • 10^2 only, in order to compare with figure 7 of Ul. 
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Figure 11. Comparison between actual and previous data. The subfigure (a) corresponds to 
figure 4.C restricted to the range of pi values considered in Subfigure (b) is the same as (a) but 
for the data obtained in |[l]. 
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Figure 12. Extinction times vs mutation probabilities. Logarithm of the theoretical (continuous 
line) and observed (points) extinction time logj^Q r versus the number of coding triplets for the 
indicated mutation probabilities. The black vertical lines correspond to N*, the critical number of 
coding triplets of the deterministic model (eq. (fTST i). 



24 




1 23456789 10 



Figure 13. Relative extinction probabilities versus the generation number. The curves corre- 
spond to = 4.44 • 10^^, for five different values of N^. The error threshold for the given is 
N* ~ 52, represented by the blue line. Data are renormalized by setting to one the probability of 
observing an extinction event before generation 32768. 
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Figure 14. Each coloured line shows the evolution of the performance during the generations 
for a single simulation. The mutation and seed values are shown in the upper left corner, while the 
state-increase rate pi is indicated on the right by the matching colour We observe the presence of 
long stasis periods alternated by short periods of fast evolution. The small black rectangle is zoomed 
on in the right part of the figure to show the actual jumps in the performance. 
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Figure 15. Distribution of the performance jumps versus their amplitudes. This histogram 
shows the number of increases in the performance versus their ampHtude. 
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Figure 16. Performance evolution near the error threshold. Here, as in figure [141 we show 
the growth of the performance in the generations but for the much higher value of the mutation 
probability pni = 0.0044. For this value of 7?rn and certain values of pi, TMs reach the error threshold. 
From there on, a typical oscillatory pattern emerges. 
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